c++ jsoncpp中文和\uXXXX使用toStyledString生成字符串中文乱码解决方案

2024-07-05 12:10| 来源: 网络整理| 查看: 265

一、中文乱码解决方法

1.1、乱码展示

1.2、乱码原因及解决方法

二、含有\uXXXX解析乱码的解决方法

2.1、乱码展示

2.2、乱码原因

2.3、解决方法

一、中文乱码解决方法 1.1、乱码展示

在使用jsoncpp解析含有中文的字符串的时候，使用toStyledString()函数生成的字符串中的中文部分将变成\u加4个16进制数字会出现解析乱码的情况。

比如：

1.2、乱码原因及解决方法

jsoncpp的源码来分析（官方下载地址：http://sourceforge.net/projects/jsoncpp/files/ ）。通过分析StyledWriter的writeValue函数发现他对字符串的处理通过valueToQuotedStringN函数进行了转义：

static String valueToQuotedStringN(const char* value, unsigned length) { if (value == nullptr) return ""; if (!isAnyCharRequiredQuoting(value, length)) return String("\"") + value + "\""; // We have to walk value and escape any special characters. // Appending to String is not efficient, but this should be rare. // (Note: forward slashes are *not* rare, but I am not escaping them.) String::size_type maxsize = length * 2 + 3; // allescaped+quotes+NULL String result; result.reserve(maxsize); // to avoid lots of mallocs result += "\""; char const* end = value + length; for (const char* c = value; c != end; ++c) { switch (*c) { case '\"': result += "\\\""; break; case '\\': result += "\\\\"; break; case '\b': result += "\\b"; break; case '\f': result += "\\f"; break; case '\n': result += "\\n"; break; case '\r': result += "\\r"; break; case '\t': result += "\\t"; break; // case '/': // Even though \/ is considered a legal escape in JSON, a bare // slash is also legal, so I see no reason to escape it. // (I hope I am not misunderstanding something.) // blep notes: actually escaping \/ may be useful in javascript to avoid = 0x20) result += static_cast(cp); else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane result += "\\u"; result += toHex16Bit(cp); } else { // codepoint is not in Basic Multilingual Plane // convert to surrogate pair first cp -= 0x10000; result += "\\u"; result += toHex16Bit((cp >> 10) + 0xD800); result += "\\u"; result += toHex16Bit((cp & 0x3FF) + 0xDC00); } //result += *c; }break;

改为：

default: { result += *c; }break;

最终结果为：

参考链接：

c++ jsoncpp使用toStyledString生成字符串中文乱码解决方案

二、含有\uXXXX解析乱码的解决方法 2.1、乱码展示

json文件如下：

解析结果：

2.2、乱码原因

之前改过valueToQuotedStringN函数，这个函数是将字符串转化为unicode编码，所以直接读取\uXXXX格式的字符串得到的其实是utf-8的字符串（如果读的是中文才是unicode编码）。所以这里需要额外的将字符串转化为unicode代码

2.3、解决方法

utf-8转unicode:

wstring UTF8ToUnicode(const string& str) { int len = 0; len = str.length(); int unicodeLen = ::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, NULL, 0); wchar_t * pUnicode; pUnicode = new wchar_t[unicodeLen + 1]; memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t)); ::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, (LPWSTR)pUnicode, unicodeLen); wstring rt; rt = (wchar_t*)pUnicode; delete pUnicode; return rt; }

在程序中加入该函数，并调用：

std::string ws2s(const std::wstring& ws) { std::string curLocale = setlocale(LC_ALL, NULL); setlocale(LC_ALL, "chs"); const wchar_t* _Source = ws.c_str(); size_t _Dsize = 2 * ws.size() + 1; char *_Dest = new char[_Dsize]; memset(_Dest, 0, _Dsize); wcstombs(_Dest, _Source, _Dsize); std::string result = _Dest; delete[]_Dest; setlocale(LC_ALL, curLocale.c_str()); return result; } //调用 std::string content = root["Cnki"][i]["content"].toStyledString(); wstring wstr = UTF8ToUnicode(content);//将utf-8转化为unicode格式 cout

【本文地址】

公司简介

联系我们